Accurate segmentation of power lines in aerial images is essential to ensure the flight safety of aerial vehicles. Acquiring high-quality ground truth annotations for training a deep learning model is a laborious process. Therefore, developing algorithms that can leverage knowledge from labelled synthetic data to unlabelled real images is highly demanded. This process is studied in Unsupervised domain adaptation (UDA). Recent approaches to self-training have achieved remarkable performance in UDA for semantic segmentation, which trains a model with pseudo labels on the target domain. However, the pseudo labels are noisy due to a discrepancy in the two data distributions. We identify that context dependency is important for bridging this domain gap. Motivated by this, we propose QuadFormer, a novel framework designed for domain adaptive semantic segmentation. The hierarchical quadruple transformer combines cross-attention and self-attention mechanisms to adapt transferable context. Based on cross-attentive and self-attentive feature representations, we introduce a pseudo label correction scheme to online denoise the pseudo labels and reduce the domain gap. Additionally, we present two datasets - ARPLSyn and ARPLReal to further advance research in unsupervised domain adaptive powerline segmentation. Finally, experimental results indicate that our method achieves state-of-the-art performance for the domain adaptive power line segmentation on ARPLSyn$\rightarrow$TTTPLA and ARPLSyn$\rightarrow$ARPLReal.
translated by 谷歌翻译
现有的基于视频的人重新识别(REID)的方法主要通过功能提取器和功能聚合器来了解给定行人的外观特征。但是,当不同的行人外观相似时,外观模型将失败。考虑到不同的行人具有不同的步行姿势和身体比例,我们建议学习视频检索的外观功能之外的歧视性姿势功能。具体而言,我们实现了一个两分支的体系结构,以单独学习外观功能和姿势功能,然后将它们串联在一起进行推理。为了学习姿势特征,我们首先通过现成的姿势检测器检测到每个框架中的行人姿势,并使用姿势序列构建时间图。然后,我们利用复发图卷积网络(RGCN)来学习时间姿势图的节点嵌入,该姿势图设计了一种全局信息传播机制,以同时实现框内节点的邻域聚集,并在框架间图之间传递消息。最后,我们提出了一种由节点注意和时间注意的双重意见方法,以从节点嵌入中获得时间图表示,其中采用自我注意机制来了解每个节点和每个帧的重要性。我们在三个基于视频的REID数据集(即火星,Dukemtmc和Ilids-Vid)上验证了所提出的方法,其实验结果表明,学习的姿势功能可以有效地改善现有外观模型的性能。
translated by 谷歌翻译
高效的时空建模是视频动作识别的重要而挑战性问题。现有的最先进的方法利用相邻的特征差异,以获得短期时间建模的运动线索,简单的卷积。然而,只有一个本地卷积,由于接收领域有限而无法处理各种动作。此外,摄像机运动带来的动作耳鸣还将损害提取的运动功能的质量。在本文中,我们提出了一个时间显着积分(TSI)块,其主要包含突出运动激励(SME)模块和交叉感知时间集成(CTI)模块。具体地,中小企业旨在通过空间级局部 - 全局运动建模突出显示运动敏感区域,其中显着对准和金字塔型运动建模在相邻帧之间连续进行,以捕获由未对准背景引起的噪声较少的运动动态。 CTI旨在分别通过一组单独的1D卷积进行多感知时间建模。同时,不同看法的时间相互作用与注意机制相结合。通过这两个模块,通过引入有限的附加参数,可以有效地编码长短的短期时间关系。在几个流行的基准测试中进行了广泛的实验(即,某种东西 - 某种东西 - 东西 - 400,uCF-101和HMDB-51),这证明了我们所提出的方法的有效性。
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译
Increasing research interests focus on sequential recommender systems, aiming to model dynamic sequence representation precisely. However, the most commonly used loss function in state-of-the-art sequential recommendation models has essential limitations. To name a few, Bayesian Personalized Ranking (BPR) loss suffers the vanishing gradient problem from numerous negative sampling and predictionbiases; Binary Cross-Entropy (BCE) loss subjects to negative sampling numbers, thereby it is likely to ignore valuable negative examples and reduce the training efficiency; Cross-Entropy (CE) loss only focuses on the last timestamp of the training sequence, which causes low utilization of sequence information and results in inferior user sequence representation. To avoid these limitations, in this paper, we propose to calculate Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, which enjoys the virtues of painless deployment, no negative sampling, and effective and efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing CCE loss on three state-of-the-art models GRU4Rec, SASRec, and S3-Rec can reach 125.63%, 69.90%, and 33.24% average improvement of full ranking NDCG@5, respectively. Using CCE, the performance curve of the models on the test data increases rapidly with the wall clock time, and is superior to that of other loss functions in almost the whole process of model training.
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
translated by 谷歌翻译
In this paper, we investigate the possibility of the backward-differential-flow-like algorithm which starts from the minimum of convexification version of the polynomial. We apply the heat evolution convexification approach through Gaussian filtering, which is actually an accumulation version of Steklov's regularization. We generalize the fingerprint theory which was proposed in the theory of computer vision by A.L. Yuille and T. Poggio in 1980s, in particular their fingerprint trajectory equation, to characterize the evolution of minimizers across the scale. On the other hand, we propose the "seesaw" polynomials $p(x|s)$ and we find a seesaw differential equation $\frac{\partial p(x|s)}{\,ds}=-\frac{1}{p''(x)}$ to characterize the evolution of global minimizer $x^*(s)$ of $p(x|s)$ while varying $s$. Essentially, both the fingerprints $\mathcal{FP}_2$ and $\mathcal{FP}_3$ of $p(x)$, consisting of the zeros of $\frac{\partial^2 p(x,t)}{\partial x^2}$ and $\frac{\partial^3 p(x,t)}{\partial x^3}$, respectively, are independent of seesaw coefficient $s$, upon which we define the Confinement Zone and Escape Zone. Meanwhile, varying $s$ will monotonically condition the location of global minimizer of $p(x|s)$, and all these location form the Attainable Zone. Based on these concepts, we prove that the global minimizer $x^*$ of $p(x)$ can be inversely evolved from the global minimizer of its convexification polynomial $p(x,t_0)$ if and only if $x^*$ is included in the Escape Zone. In particular, we give detailed analysis for quartic and six degree polynomials.
translated by 谷歌翻译
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
translated by 谷歌翻译